Lecturer: Dr. Veng Sotheara
Why Convert to Frequencies?
In pixel form, noise and detail are mixed together and hard to tell apart. In frequency form, they mostly end up in different places, so you can treat them separately.
This lets us remove noise cleanly without destroying the actual image — something we can't do pixel by pixel.
The Fast Fourier Transform (FFT) is a clever algorithm published by Cooley and Tukey in 1965. It converts a signal, like a noisy image, from the regular "what does it look like?" world into the frequency world, where you can see which parts are smooth and which parts are noisy.
Why do we care? Because noise lives at high frequencies, and the real image content lives at low frequencies. Once we separate them, we can remove the noise and put the image back together, cleaner than before.
Imagine a band is playing music, but a loud fan is making a buzzing noise. If you use an equalizer, you can turn down the high-frequency "buzz" without muting the music. FFT does exactly this — but for images instead of sound.
Why is it "Fast"?
Without the trick, computing the Discrete Fourier Transform (DFT) of N samples takes O(N²) steps. For a 1,000,000-sample signal (the pixels of a 1000×1000 image) that's a trillion operations. FFT reduces this to O(N log N), roughly 20 million operations, a speedup of about 50,000×. That's why it changed the world of digital signal processing.
Step 1 — Think of an image as a grid of numbers.
Every pixel has a brightness value (0 = black, 255 = white). An image is just a 2D grid of these numbers.
A 256×256 image = 65,536 numbers. FFT analyzes all of them to ask: "What frequencies make up this image?"
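In NumPy this "grid of numbers" view is literal. A minimal sketch (toy 4×4 image, brightness values chosen arbitrarily for illustration):

```python
import numpy as np

# A tiny 4x4 "image": each entry is a brightness value (0 = black, 255 = white)
img = np.array([
    [  0,  64, 128, 255],
    [ 32,  96, 160, 224],
    [ 64, 128, 192, 255],
    [  0,   0, 255, 255],
], dtype=np.uint8)

print(img.shape)   # (4, 4)
print(img.size)    # 16 values -- a 256x256 image would have 65,536
```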
Step 2 — The DFT formula
The 1D Discrete Fourier Transform converts a sequence of N samples x[0], …, x[N−1] into N frequency components:

X[k] = Σ (n = 0 to N−1) x[n] · e^(−2πi·kn/N),   for k = 0, 1, …, N−1

Each output X[k] measures how strongly frequency k is present in the input.
Think of the DFT like a recipe decoder. Given a cake (the image), it tells you exactly how much flour (low freq), sugar (mid freq), and salt (high freq / noise) are in it. Once you know the amounts, you can adjust them.
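The recipe-decoder idea can be demonstrated with a toy 1D signal (the frequencies and amplitudes below are arbitrary choices for illustration): mix a slow wave and a fast wave, and np.fft.fft reports exactly how much of each "ingredient" is present.

```python
import numpy as np

N = 64
n = np.arange(N)
# "Cake" = a big slow ingredient (frequency 2) plus a small fast one (frequency 20)
signal = 3.0 * np.sin(2 * np.pi * 2 * n / N) + 0.5 * np.sin(2 * np.pi * 20 * n / N)

spectrum = np.fft.fft(signal)
magnitude = np.abs(spectrum) / (N / 2)   # normalize so amplitudes are recovered

# The DFT "decodes the recipe": amount 3.0 at bin 2, amount 0.5 at bin 20
print(round(magnitude[2], 2))    # 3.0
print(round(magnitude[20], 2))   # 0.5
```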
Step 3 — Extending to 2D (Images)
For images, we run the 1D FFT on every row first, then on every column. The result is a 2D spectrum where:
Low frequency: A plain blue sky in a photo — barely any variation. Appears as a bright spot at the center of the FFT spectrum.
High frequency: A chain-link fence — lots of rapid changes. Appears far from the center in the spectrum.
Noise: Random pixel variations — they spread energy everywhere in the spectrum, especially at high frequencies.
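The low-frequency claim is easy to check on a synthetic image (an illustrative gradient, not the project's test image): after np.fft.fftshift, the brightest point of the spectrum sits exactly at the center.

```python
import numpy as np

# A smooth horizontal gradient: almost all energy is low frequency
img = np.tile(np.linspace(0, 255, 64), (64, 1))

spectrum = np.fft.fftshift(np.fft.fft2(img))
mag = np.abs(spectrum)

center = mag[32, 32]           # zero-frequency (DC) component after the shift
print(center == mag.max())     # True: the brightest spectral spot is the center
```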
Computation advantage for images:
For an N×N image, a regular 2D DFT costs O(N⁴) operations; with FFT it drops to O(N² log N). For a 256×256 image: the DFT needs about 4 billion steps, the FFT only ~500,000. That's roughly 8,000× faster!
Here is the complete pipeline used in this project. Think of it as a production line — the noisy image goes in one end and a clean image comes out the other.
1. Forward FFT: convert from pixel-space to frequency-space using np.fft.fft2(). Each pixel's brightness becomes a mix of frequency contributions.
2. Shift: normally, low frequencies land at the corners. np.fft.fftshift() reorganizes the spectrum so low frequencies are in the middle, which is easier to work with visually and mathematically.
3. Design the mask: choose a filter type (Ideal, Gaussian, Butterworth) and a cutoff frequency. The mask is a 2D array of the same size as the image; values near 1 keep frequencies, values near 0 block them. The cutoff controls the trade-off between noise removal and detail preservation.
4. Apply the mask: element-wise multiplication applies the filter. Frequencies the mask keeps (≈1) pass through; frequencies it blocks (≈0) are suppressed. This is where noise is removed.
5. Inverse shift: use np.fft.ifftshift() to move the zero-frequency component back to the corners. This must happen before the inverse FFT; skipping it produces a spatially incorrect result.
6. Inverse FFT: use np.fft.ifft2() to convert back from frequency-space to pixel-space. The result is a complex-valued array.
7. Take the real part: use np.real() to discard the tiny imaginary residuals left by floating-point math. Only the real values represent actual pixel intensities.
8. Clip and cast: floating-point arithmetic can push values slightly outside the valid pixel range. Use np.clip(result, 0, 255).astype(np.uint8) to clamp values and cast back to an 8-bit integer image, ready for display or saving.
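The whole pipeline fits in one short function. A minimal sketch assuming a Gaussian mask (the project's exact mask construction and parameters may differ):

```python
import numpy as np

def gaussian_lowpass_denoise(img, cutoff=45):
    """FFT denoising pipeline with a Gaussian low-pass mask (sketch)."""
    rows, cols = img.shape
    # Forward FFT, then shift the zero frequency to the center
    F = np.fft.fftshift(np.fft.fft2(img))
    # Gaussian mask: ~1 near the center (low freq), ~0 far away (high freq)
    y, x = np.ogrid[:rows, :cols]
    dist2 = (y - rows // 2) ** 2 + (x - cols // 2) ** 2
    mask = np.exp(-dist2 / (2 * cutoff ** 2))
    # Element-wise multiplication suppresses the high frequencies
    F_filtered = F * mask
    # Undo the shift, invert the FFT, keep only the real part
    result = np.real(np.fft.ifft2(np.fft.ifftshift(F_filtered)))
    # Clamp to [0, 255] and cast back to 8-bit
    return np.clip(result, 0, 255).astype(np.uint8)

# Usage: flat gray plus Gaussian noise comes back much smoother
rng = np.random.default_rng(0)
noisy = np.clip(128 + rng.normal(0, 25, (64, 64)), 0, 255).astype(np.uint8)
denoised = gaussian_lowpass_denoise(noisy, cutoff=10)
print(denoised.std() < noisy.std())   # True: variation drops after filtering
```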
Imagine pouring coffee through a filter. The filter catches the grounds (noise) while the liquid (real image content) passes through. The FFT filter mask is exactly like that — but for frequencies.
Two types of low-pass filters were designed and compared. A low-pass filter lets low frequencies pass (keeps smooth image content) and blocks high frequencies (removes noise).
You know how when you cut a piece of paper very sharply it sometimes leaves a rough edge? The Ideal Filter does the same thing in frequency space: its abrupt cutoff creates wavy ripples, called ringing, around edges in the final image. This is the Gibbs phenomenon.
The Gaussian Filter avoids this because it gradually reduces frequencies instead of abruptly cutting them off — like slowly fading a song rather than stopping it instantly.
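The two mask shapes can be sketched directly from the radial distance to the spectrum center (a plausible construction; the report's exact formulas may differ):

```python
import numpy as np

def radial_distance(shape):
    rows, cols = shape
    y, x = np.ogrid[:rows, :cols]
    return np.sqrt((y - rows // 2) ** 2 + (x - cols // 2) ** 2)

def ideal_lowpass(shape, cutoff):
    # Hard circle: 1 inside the cutoff radius, 0 outside -- the sharp "paper cut"
    return (radial_distance(shape) <= cutoff).astype(float)

def gaussian_lowpass(shape, cutoff):
    # Smooth bell: fades gradually from 1 at the center toward 0
    return np.exp(-radial_distance(shape) ** 2 / (2 * cutoff ** 2))

ideal = ideal_lowpass((128, 128), cutoff=45)
gauss = gaussian_lowpass((128, 128), cutoff=45)
# The ideal mask contains only 0s and 1s; the Gaussian has in-between values
print(set(np.unique(ideal)))       # only the two extremes
print(0.0 < gauss[0, 0] < 1.0)     # True: no abrupt jump anywhere
```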
What's a "cutoff frequency"?
It's the radius of your filter circle. A smaller cutoff = more aggressive filtering = smoother but blurrier image. A larger cutoff = less filtering = more detail preserved but more noise remains.
In this project, cutoff values from 5 to 115 were tested. The sweet spot was found at cutoff = 45 — best balance between removing noise and keeping detail.
How we measure quality: PSNR (Peak Signal-to-Noise Ratio)
PSNR tells you how close the denoised image is to the original clean image: PSNR = 10 · log₁₀(MAX² / MSE), where MAX is the maximum pixel value (255 for 8-bit images) and MSE is the mean squared error between the two images. It is measured in decibels (dB); higher PSNR = better quality.
Below 20 dB → Image looks terrible, lots of noise visible.
20–30 dB → Acceptable quality for many applications.
Above 30 dB → Very good quality, noise mostly invisible to human eye.
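PSNR is straightforward to compute from the mean squared error. A minimal sketch (the flat synthetic test image and the σ = 25 noise level are illustrative assumptions):

```python
import numpy as np

def psnr(original, denoised, max_val=255.0):
    """Peak Signal-to-Noise Ratio in dB between two same-sized images."""
    mse = np.mean((original.astype(float) - denoised.astype(float)) ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return 10 * np.log10(max_val ** 2 / mse)

clean = np.full((64, 64), 128, dtype=np.uint8)
rng = np.random.default_rng(1)
noisy = np.clip(clean + rng.normal(0, 25, clean.shape), 0, 255).astype(np.uint8)
print(round(psnr(clean, noisy), 1))   # roughly 20 dB for sigma = 25 noise
```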
Our project took a noisy image at 17.65 dB and improved it to 23.28 dB. That's a meaningful improvement — the image went from "visibly noisy" to "acceptable quality."
Key insight from the cutoff analysis:
As the cutoff increases from 5 to 45, PSNR goes up (we're keeping more of the real image). Past 45, PSNR drops because we start letting noise back in. The optimal cutoff at 45 is the "Goldilocks zone" — not too much filtering, not too little.
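The sweep is easy to reproduce on a synthetic image (the gradient "scene" and noise level are illustrative assumptions; the optimal cutoff depends on the image, so the value found here need not be 45):

```python
import numpy as np

def psnr(a, b, max_val=255.0):
    mse = np.mean((a.astype(float) - b.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(max_val ** 2 / mse)

def gaussian_denoise(img, cutoff):
    rows, cols = img.shape
    F = np.fft.fftshift(np.fft.fft2(img))
    y, x = np.ogrid[:rows, :cols]
    mask = np.exp(-((y - rows // 2) ** 2 + (x - cols // 2) ** 2) / (2 * cutoff ** 2))
    out = np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))
    return np.clip(out, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
clean = np.tile(np.linspace(40, 220, 128), (128, 1)).astype(np.uint8)  # smooth "scene"
noisy = np.clip(clean + rng.normal(0, 25, clean.shape), 0, 255).astype(np.uint8)

# Sweep the same cutoff range as the report (5 to 115) and score each result
scores = {c: psnr(clean, gaussian_denoise(noisy, c)) for c in range(5, 120, 10)}
best = max(scores, key=scores.get)
print(best, round(scores[best], 2))   # the "Goldilocks" cutoff for THIS image
```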
The same FFT framework can do the opposite of denoising — sharpening. Instead of a low-pass filter (remove high frequencies), we use a high-pass filter to keep only the high frequencies, which are the edges and fine details.
Sharpened = Original + α × High-Frequency Components
α controls how strong the sharpening is. A small α adds a subtle crispness. A large α creates very dramatic edge enhancement (but can look unnatural).
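The sharpening formula can be sketched with a Gaussian high-pass built as 1 minus the low-pass mask (an illustrative choice, not necessarily the report's construction):

```python
import numpy as np

def fft_sharpen(img, cutoff=30, alpha=1.0):
    """Sharpened = Original + alpha * high-frequency components (sketch)."""
    rows, cols = img.shape
    F = np.fft.fftshift(np.fft.fft2(img))
    y, x = np.ogrid[:rows, :cols]
    dist2 = (y - rows // 2) ** 2 + (x - cols // 2) ** 2
    # High-pass mask = 1 - Gaussian low-pass: keeps edges, drops smooth regions
    highpass = 1.0 - np.exp(-dist2 / (2 * cutoff ** 2))
    detail = np.real(np.fft.ifft2(np.fft.ifftshift(F * highpass)))
    sharpened = img.astype(float) + alpha * detail
    return np.clip(sharpened, 0, 255).astype(np.uint8)

# Usage: boost the contrast around a vertical step edge
img = np.zeros((64, 64), dtype=np.uint8)
img[:, 32:] = 200                        # dark left half, bright right half
out = fft_sharpen(img, cutoff=10, alpha=2.0)
```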
No technique is perfect. Here's where FFT-based denoising struggles:
FFT filters affect the entire image uniformly — cannot adapt to local regions where noise or detail varies.
Example: A portrait photo where the face is noisy but the background is already sharp — the filter blurs both equally, softening background detail you wanted to keep.
Works best for additive Gaussian noise. Impulsive noise (salt-and-pepper) and structured noise (periodic patterns) need specialized handling, such as median or notch filters.
Example: A scanned document with regular horizontal lines (periodic noise from the scanner) — FFT low-pass won't remove them because they sit at specific frequencies that need a notch filter, not a low-pass filter.
Aggressive filtering removes noise but also blurs genuine edges and fine textures. No single cutoff is perfect everywhere.
Example: At cutoff = 10 the image looks clean but text becomes unreadable and edges dissolve. At cutoff = 80 edges are sharp but noise is still clearly visible. Neither extreme is acceptable.
The optimal cutoff frequency depends on the specific image and noise level; the method cannot pick it automatically.
Example: A lightly noisy photo may need cutoff = 80 to preserve fine detail, while a heavily corrupted image might need cutoff = 30. There is no universal formula — you test, measure PSNR, and compare.
Future improvements mentioned in the report:
Adaptive filters that vary the cutoff spatially (different settings for different regions), wavelet-based denoising for better edge preservation, and deep learning approaches that use FFT as a feature extraction step.
Click each question to reveal the answer. Try to guess before you look! 🧠